Gambling in a Rigged Casino: The Adversarial Multi-armed Bandit Problem. Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150

Authors

  • Peter Auer
  • Robert E. Schapire
Abstract

Note: The present draft is a very substantially revised and expanded version which has been submitted for journal publication.

In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the expected per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$, and we give an improved rate of convergence when the best arm has fairly low payoff. We also prove a general matching lower bound on the best possible performance of any algorithm in our setting. In addition, we consider a setting in which the player has a team of "experts" advising him on which arm to play; here, we give a strategy that will guarantee expected payoff close to that of the best expert. Finally, we apply our result to the problem of learning to play an unknown repeated matrix game against an all-powerful adversary.
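The abstract does not spell out the algorithm, so the following is only a minimal sketch of one exponential-weighting strategy of the kind usually called Exp3 in the adversarial bandit literature. The function name `exp3`, the callback `reward_fn`, the mixing parameter `gamma`, and the assumption that payoffs lie in [0, 1] are all illustrative choices, not details taken from this page.

```python
import math
import random


def exp3(num_arms, num_rounds, reward_fn, gamma=0.1, rng=None):
    """Sketch of an exponential-weight bandit strategy (assumed form).

    reward_fn(t, arm) is a hypothetical callback returning the payoff in
    [0, 1] of the arm played at round t; payoffs of other arms stay hidden.
    """
    rng = rng or random.Random()
    weights = [1.0] * num_arms
    probs = [1.0 / num_arms] * num_arms
    total_reward = 0.0
    for t in range(num_rounds):
        weight_sum = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / weight_sum + gamma / num_arms
                 for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: unbiased for the played arm's payoff.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, probs


if __name__ == "__main__":
    # Toy payoff sequence (an assumption): arm 2 always pays 1, others 0.
    total, final_probs = exp3(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
    print(total, final_probs)
```

The uniform mixing term keeps every arm's probability at least `gamma / num_arms`, which bounds the importance-weighted estimate and is what lets the strategy cope with payoffs chosen by an adversary rather than drawn from a fixed distribution.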


Similar resources

Discrete versus Analog Computation: Aspects of Studying the Same Problem in Different Computational Models. Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150

In this tutorial we want to outline some of the features coming up when analyzing the same computational problems in different complexity theoretic frameworks. We will focus on two problems; the first related to mathematical optimization and the second dealing with the intrinsic structure of complexity classes. Both examples serve well for working out in how far different approaches to the same pro...


Gambling in a Rigged Casino: The Adversarial Multi-Arm Bandit Problem

In the multi-armed bandit problem, a gambler must decide which arm of non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff...


Dynamically Adapting Kernels in Support Vector Machines. Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150

The kernel-parameter is one of the few tunable parameters in Support Vector machines, and controls the complexity of the resulting hypothesis. The choice of its value amounts to model selection, and is usually performed by means of a validation set. We present an algorithm which can automatically perform model selection and learning with no additional computational cost and with no need of a va...


Multiplicative Updatings for Support-vector Learning. Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150

Support Vector machines find maximal margin hyperplanes in a high dimensional feature space. Theoretical results exist which guarantee a high generalization performance when the margin is large or when the number of support vectors is small. Multiplicative-Updating algorithms are a new tool for perceptron learning whose theoretical properties are well studied. In this work we present a Multiplica...


New Support Vector Algorithms. Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150

We describe a new class of Support Vector algorithms for regression and classification. In these algorithms, a parameter ν lets one effectively control the number of Support Vectors. While this can be useful in its own right, the parametrization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy parameter ε in the regression case, a...




Publication date: 1998